ABSTRACT


A review of discrete pursuer-evader games and known solutions is presented. A method is given for obtaining a finite memory, near-optimal evader strategy for the three-step game, which greatly reduces data storage requirements compared with previous near-optimal strategies. Additionally, near-optimal evader strategies for the four-step game are discussed.
I. INTRODUCTION


The discrete time step pursuer-evader game was first described by Rufus Isaacs of the Rand Corporation in the early 1950's in an attempt to look at the problem of attacking a moving target who is maneuvering so as to confound the prediction of his future position. The general problem, as described by Isaacs, is as follows:


A battleship in midocean is aware of an enemy bomber's presence, but the plane is too high for precise observation. The ship is interested only in not being hit; it has no offensive means. The plane has one bomb and we suppose--to avoid extraneous factors--that the bomber's aim is excellent. The battleship knows this, but knows nothing about when or where the bomb will be dropped until after detonation. It is to maneuver so as to minimize the hit probability. . . There is a time lag T between the bomber's last sighting of the ship and detonation. Thus the bomber must aim at an anticipated location of the ship . . . As simple as this problem seems circumstantially, it is difficult technically. To gain a foothold, we simplified it further. We made the ocean one-dimensional and discrete. That is, we supposed the battleship to be located on one of a long row of points and at each unit of time he hops to one adjoining one, enjoying the sole choice of a right or left jump. The time lag was to be an integral number n of time units, or--the same thing--of jumps. This is tantamount to saying that the bomber knows all positions of the battleship which precede his present one by n jumps or more Ref.[1].


The solution to the single time step game (i.e. n=1) is trivial, but the complexity increases greatly as the time lag, or number of time steps, increases. Isaacs, upon formulating the game, proposed pursuer and evader strategies for the two-step game; however, the proof of the optimality of these strategies is highly complex. The complexity of the multiple step games arises from the fact that the evader doesn't know when the pursuer will attack; if he did, it would be an easy matter for the evader to distribute himself uniformly over the n+1 possible positions at the time of detonation and limit the pursuer to a kill probability of 1/(n+1).

Without knowing the time of attack, the evader must attempt to make his position uniform at every time step, and this is not possible.

The three-step pursuer-evader game is yet unsolved; however, near-optimal strategies for both the pursuer and evader have been described. The best existing evader strategy, developed by Joseph Bram Ref.[2], involves the evader maintaining an infinite memory of probabilities corresponding to the probability of turning given the evader has not turned for the last k moves. This thesis will investigate alternative finite evader strategies to attempt to lower the existing upper bound on the three-step game value while drastically reducing memory requirements, and
additionally look briefly at possible evader strategies in 


the four-step game. 





II. KNOWN SOLUTIONS AND STRATEGIES FOR PURSUER-EVADER GAMES


A. STRUCTURE

For uniformity, the convention and structure described below will be used hereafter in the description of all discrete n-step pursuer-evader games. The pursuer is the maximizing player who, by selection of time of fire and aim point, tries to maximize the probability of killing the evader (a kill is achieved when the pursuer fires at the position the evader subsequently occupies n time steps later). The evader is the minimizing player who, by selection of maneuvers along the discrete linear state space, attempts to minimize the probability of being killed. The evader's maneuvers can be described as a sequence of lefts and rights (L and R), with each n-bit sequence of L's and R's corresponding to one of the n+1 final positions achievable in n steps from an arbitrary starting position, as shown in Figure 2.1. This mapping of n-bit left-right sequences to final positions is symmetric under interchange of L's and R's (i.e. in the three-step case LLR leads to the position symmetric to the one reached by RRL). Due to this symmetry it is equivalent to describe the evader's maneuvers as a sequence of straights and turns (S and T), which provides the equivalent mapping in Figure 2.2. A turn signifies that the evader moves in the opposite direction to his previous move,








LLLL   LLLR   LLRR   LRRL   LRRR   RRRR
       LLRL   LRLR   RLRL   RLRR
       LRLL   RLLR   RRLL   RRLR
       RLLL                 RRRL

Figure 2.1 Possible Evader Positions in n Steps.





Figure 2.2 Possible Evader Positions in terms of Straights and Turns.
while a straight signifies that he continues in the same direction as his previous move. Any n-bit sequence of lefts and rights can be translated into an equivalent (n-1)-bit sequence of straights and turns (e.g. LRRL becomes TST). Note that in general there may be several possible sequences of turns and straights which lead to the same final position (for n=3, taking the move just before the sequence to be to the right, TST, TTT and STS all result in the evader occupying the position one step to the left of his original position).
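The straight/turn encoding can be made concrete with a short Python sketch. The function names are illustrative, not from the thesis, and the direction of the hop made just before the sequence is assumed to be known (taken as R by default, which loses nothing by the left-right symmetry). Grouping all 2^n sequences by net displacement reproduces the many-to-one mapping described above:

```python
from collections import defaultdict
from itertools import product


def st_to_lr(st_seq, prev="R"):
    """Translate a straight/turn string into left/right hops.

    `prev` is the direction of the hop made just before the sequence
    starts (an assumption; by symmetry it may be taken as "R").
    """
    hops, last = [], prev
    for c in st_seq:
        if c == "T":                      # turn: reverse direction
            last = "L" if last == "R" else "R"
        elif c != "S":                    # straight: keep direction
            raise ValueError(f"unexpected symbol {c!r}")
        hops.append(last)
    return "".join(hops)


def displacement(lr_seq):
    """Net displacement of the evader: +1 per right hop, -1 per left hop."""
    return sum(1 if h == "R" else -1 for h in lr_seq)


def group_by_final_position(n, prev="R"):
    """Group all 2**n straight/turn sequences by the final position reached."""
    groups = defaultdict(list)
    for bits in product("ST", repeat=n):
        st = "".join(bits)
        groups[displacement(st_to_lr(st, prev))].append(st)
    return dict(sorted(groups.items()))


for d, seqs in group_by_final_position(3).items():
    print(f"{d:+d}: {seqs}")
```

For n=3 the group at displacement -1 is STS, TST and TTT, matching the example in the text, and st_to_lr("TST", "L") returns "RRL", i.e. the last three hops of LRRL.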


B. ONE-STEP GAME 

The single step pursuer-evader game has a simple solution. With only one time step elapsing between the pursuer's time of fire and weapon detonation, the evader can always distribute himself uniformly over the two positions achievable in one step, shown in Figure 2.3. The evader on each step can continue straight with probability (1-p) or turn with probability p. Since the intelligent pursuer will limit his shot to one of the two feasible positions of the evader when he fires (position 1 or 2 of Figure 2.3), the game can be represented graphically as shown in Figure 2.4. The minimax solution occurs when p=0.5. The corresponding value of the game is 0.5. The optimal evader strategy is to arrive at position 1 or 2 with equal probability.


C. TWO-STEP GAME
The two-step pursuer-evader game is not nearly as simple in its solution as the one-step game. The solution was


Figure 2.3 Achievable Evader Positions in One-Step Game.

Figure 2.4 Graphical Solution to the One-Step Game.


found by starting with the hypothesis that the evader's maneuver will depend only on his previous maneuver and none other; thus the probability of continuing in the same direction as the last move is denoted by (1-p), with p being the probability of moving in the opposite direction to the previous move. The attainable positions for the evader and the corresponding probabilities under the above hypothesis are shown in Figure 2.5. The pursuer can be expected to select the position (1, 2 or 3) with the highest associated probability. The evader will select p so as to minimize this maximum probability. The optimal value of p is then found by solving:


$$\min_{0\le p\le 1}\ \max\left\{\,p(1-p),\ \ p^2+(1-p)p,\ \ (1-p)^2\,\right\}$$


Graphically, the solution is shown in Figure 2.6. Since the middle entry simplifies to p^2+(1-p)p = p, which is never less than p(1-p), the evader is minimizing max{p, (1-p)^2}; the first term increases and the second decreases in p, so the minimum occurs where p=(1-p)^2. This quadratic has a root at p=(3-√5)/2 = 0.38197...; this value is also the probability that the evader is in position 2 or 3 of Figure 2.5, and thus the value of the game. The proof that this evader strategy is optimal and that (3-√5)/2 is the value of the game is complex. Three different proofs are given by Dubins Ref.[3], Isaacs Ref.[4] and Ferguson Ref.[5]. The pursuer strategies in the multi-step games are characterized by the non-existence of an optimal strategy; the pursuer can always increase his expected

Position:        1           2              3
Sequences:       TS        ST, TT           SS
Probability:   p(1-p)   p^2+(1-p)p       (1-p)^2

Figure 2.5 Achievable Evader Positions in Two-Step Game.

Figure 2.6 Graphical Solution to the Two-Step Game.
kill probability by waiting a few more time periods, but he cannot wait indefinitely to fire or his payoff is zero. This contradiction leads to strategies for the pursuer which have payoffs arbitrarily close to, but not equal to, the value of the game. Ferguson developed such a pursuer strategy, which confirmed that (3-√5)/2 = 0.38197... was the value of the two-step game.
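A quick numerical check of the two-step result, as a Python sketch: it evaluates the three position probabilities of Figure 2.5 (in the ordering p(1-p), p^2+(1-p)p, (1-p)^2 used above), takes the closed-form root of p=(1-p)^2, and cross-checks it with a brute-force grid over p. It verifies only the evader's side of the minimax and is not a substitute for the optimality proofs cited above.

```python
import math


def two_step_payoffs(p):
    """Probabilities of the three positions reachable two steps after the
    time of fire when the evader turns with probability p (Figure 2.5)."""
    return (p * (1 - p), p * p + (1 - p) * p, (1 - p) ** 2)


# The middle entry equals p, so the minimax point solves p = (1 - p)**2,
# i.e. p**2 - 3p + 1 = 0; take the root that lies in [0, 1].
p_star = (3 - math.sqrt(5)) / 2

# Brute-force cross-check: the pursuer fires at the likeliest position,
# the evader picks p on a fine grid to minimize that maximum.
grid = [i / 100_000 for i in range(100_001)]
p_grid = min(grid, key=lambda p: max(two_step_payoffs(p)))

print(p_star, max(two_step_payoffs(p_star)), p_grid)
```

The grid minimizer lands within one grid spacing of (3-√5)/2, and at that point the pursuer's best payoff equals p itself, which is why the same number is both the optimal turning probability and the value of the game.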


D. THREE-STEP GAME 
As stated earlier, the three-step pursuer-evader game is yet unsolved. The value v of the three-step game has been bounded to:

… ≤ v ≤ 0.28903

where the upper bound is due to Bram. This section will investigate previous near-optimal evader strategies for the three-step game and the resulting upper bounds upon the game value.
1. Markov Hypothesis Strategy

The Markov Hypothesis for the n-step pursuer-evader game is stated as follows: the probability that the evader moves left or right (or, straight or turn) is dependent on the previous n-1 moves but not on any moves further in the past than the (n-1)st. This form of evader strategy makes intuitive sense, since it does not seem likely that an optimal evader strategy will depend upon information which the pursuer already knows at the time of fire. The known optimal strategies for the one and two-step games adhere to the Markov Hypothesis. In the one-step game the optimal evader turns or continues straight with equal probability, therefore independent of all previous moves (i.e. P(S) = P(T) = P(L) = P(R)). In the two-step game the optimal evader uses a strategy where the probability of turning (or continuing straight) depends only upon his previous move (P(S) = P(L|L) = P(R|R) = 0.61803 and P(T) = P(L|R) = P(R|L) = 0.38197).

The Markov Hypothesis will now be applied to the three-step game. Since the evader will condition his next move upon his previous two moves, his strategy can be described by a 2x2 transition matrix as shown in Figure 2.7. The state of the evader at any time is S or T, since this state is a function of the evader's last two moves (i.e. LL means S). In this transition matrix:

q1 = P(Next state is S | Last state was S)
q2 = P(Next state is S | Last state was T).

The four achievable positions for the evader in the three-step game and the associated maneuver sequences are shown in Figure 2.8. Let the variable W represent the final position of the evader three steps after the time of fire; from Figure 2.8 it can be seen that W ∈ {1,2,3,4}. Let the variable STATE represent the state (S or T) that the evader occupies at the time of fire. The probability that the evader occupies any final position is a function of q1 and q2 when
                    NEXT STATE
                     S        T
LAST STATE    S     q1      1-q1
              T     q2      1-q2

Figure 2.7 Markov Hypothesis Transition Matrix for Three-Step Game.

Figure 2.8 Achievable Evader Positions in Three-Step Game.
conditioned upon his initial state. For example, given STATE=S, to arrive at W=1, the sequence of states undergone must be:

S → T → S → S

The probability of this occurrence can be written:

$$P(W=1\mid \mathrm{STATE}=S) = (1-q_1)\,q_2\,q_1$$

The remaining seven conditional probabilities are (S and T abbreviating STATE=S and STATE=T):

$$
\begin{aligned}
P(W=2\mid S) &= (1-q_1)q_2(1-q_1) + (1-q_1)(1-q_2)^2 + q_1(1-q_1)q_2\\
P(W=3\mid S) &= (1-q_1)(1-q_2)q_2 + q_1(1-q_1)(1-q_2) + q_1^2(1-q_1)\\
P(W=4\mid S) &= q_1^3\\
P(W=1\mid T) &= (1-q_2)q_2q_1\\
P(W=2\mid T) &= (1-q_2)q_2(1-q_1) + (1-q_2)^3 + q_2(1-q_1)q_2\\
P(W=3\mid T) &= (1-q_2)^2q_2 + q_2(1-q_1)(1-q_2) + q_2q_1(1-q_1)\\
P(W=4\mid T) &= q_2q_1^2
\end{aligned}
$$


Whenever the pursuer chooses to fire, he knows which of the two states (S or T) the evader is in by observing his last two moves. The optimal values of q1 and q2 under this strategy are found by solving the following non-linear problem:

$$\min_{q_1,q_2}\ \max_{i\in\{S,T\},\ j\in\{1,2,3,4\}} P(W=j\mid \mathrm{STATE}=i)\qquad \text{s.t. } 0\le q_1,q_2\le 1$$





The solution, due to Ferguson, is q1 = (3-√3)/2 and q2 = √3-1, with a corresponding game value of 0.29423; the resulting matrix of conditional probabilities is shown in Table I. Ferguson states, when presenting this evader strategy, that it is not known to be optimal, and in fact he conjectures that no evader strategy of finite dependence is optimal for the evader. The strategy of Bram presented in the next section will show that indeed an evader strategy of infinite dependence does result in a tighter bound on the game value.
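The eight Markov Hypothesis formulas are easy to evaluate directly. The Python sketch below (names are illustrative) tabulates P(W=w|STATE) at q1 = (3-√3)/2, q2 = √3-1 and reports the pursuer's best payoff; the entries should agree with Table I to within rounding in the last digit, and the maximum should be 0.29423.

```python
import math


def markov3(q1, q2):
    """P(W=w | STATE), w = 1..4, for the three-step Markov Hypothesis strategy.

    q1 = P(straight | state S), q2 = P(straight | state T); the entries follow
    the eight formulas given in the text.
    """
    p1, p2 = 1 - q1, 1 - q2
    return {
        "S": [p1 * q2 * q1,
              p1 * q2 * p1 + p1 * p2 ** 2 + q1 * p1 * q2,
              p1 * p2 * q2 + q1 * p1 * p2 + q1 ** 2 * p1,
              q1 ** 3],
        "T": [p2 * q2 * q1,
              p2 * q2 * p1 + p2 ** 3 + q2 * p1 * q2,
              p2 ** 2 * q2 + q2 * p1 * p2 + q2 * q1 * p1,
              q2 * q1 ** 2],
    }


def pursuer_bound(table):
    """Best payoff for a pursuer who sees the state and fires at the likeliest W."""
    return max(max(row) for row in table.values())


q1, q2 = (3 - math.sqrt(3)) / 2, math.sqrt(3) - 1   # Ferguson's solution
table = markov3(q1, q2)
for state, row in table.items():
    print(state, [round(x, 5) for x in row])
print("bound:", round(pursuer_bound(table), 5))      # bound: 0.29423
```

At this point four entries (W=2 from S, W=3 and W=4 from T, and none other) tie at the maximum, which is typical of a minimax solution: moving either q shifts probability onto at least one of the tied positions.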
2. Infinite Dependence Strategy

As mentioned in Chapter I, the best existing evader strategy for the three-step game was described by Joseph Bram. This strategy can be described as an infinite sequence of conditional probabilities that the evader will continue straight given the state S of his previous moves. If the previous move by the evader was a turn, the evader is in state S=1, while if the previous k-1 moves have been straight (following a turn), the evader is in state S=k. (Note that the state space of S is infinite.) We will denote a turn by T and a straight by S as before. At each time step the evader continues straight or turns with a probability dependent upon his state S. Let:

p_k = P(Straight | S=k).

If the evader is in state k at some time n, at time n+3 the evader can be in one of the four positions described by W





TABLE I

P(W=w|STATE) for Three-Step Markov Hypothesis Strategy

q1 = 0.63397
q2 = 0.73205

               W = 1       2       3       4
STATE   S        .16987  .29423  .28109  .25480
        T        .12436  .28719  .29423  .29423
previously. There are eight possible 3-bit sequences of S's and T's, which correspond to the four possible terminal positions as shown in Figure 2.8. The probabilities associated with each position W given k are as follows:

$$
\begin{aligned}
P(W=1\mid S=k) &= (1-p_k)\,p_1p_2\\
P(W=2\mid S=k) &= (1-p_k)p_1(1-p_2) + (1-p_k)(1-p_1)^2 + p_k(1-p_{k+1})p_1\\
P(W=3\mid S=k) &= (1-p_k)(1-p_1)p_1 + p_k(1-p_{k+1})(1-p_1) + p_kp_{k+1}(1-p_{k+2})\\
P(W=4\mid S=k) &= p_kp_{k+1}p_{k+2}
\end{aligned}
$$

If the pursuer fires at time n, at position w, when S=k, his expected payoff will be:

$$P(W=w\mid S=k)$$

The upper bound on the value of the game played with this strategy is:

$$\max_{k}\ \max_{w}\ P(W=w\mid S=k)$$


The evader of course will attempt to select his infinite array of p_k's so as to minimize the above bound, which is the maximum payoff that the pursuer can achieve. The best set of p_k's found by Bram is delineated in Table II, while the resulting P(W=w|S=k) is shown in Table III. The upper bound on the game value under this specific set of p_k's is the maximum value found in Table III, or 0.28903. In this strategy the decision to turn or continue straight has a dependence upon the previous moves. That dependence may extend infinitely far back; thus the evader is required to maintain the infinite array of p_k's to execute this near-optimal strategy.
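Bram's four formulas can likewise be evaluated for any array of p_k's. In the Python sketch below, p is supplied as a function of k; `constant_tail` is a hypothetical helper that holds p_k constant after a finite list, so only finitely many distinct rows arise. Bram's own values from Table II are not reproduced; as a consistency check, the Markov Hypothesis solution corresponds to p_1 = q2 and p_k = q1 for k ≥ 2, and should give back 0.29423.

```python
import math


def bram_rows(p, k_max):
    """P(W=w | S=k), w = 1..4, for k = 1..k_max.

    p(k) gives P(straight | S=k): S=1 right after a turn, S=k after k-1
    consecutive straights.  A turn always resets the state to 1, which is
    why p(1) and p(2) appear in every row.
    """
    a, b = p(1), p(2)
    rows = {}
    for k in range(1, k_max + 1):
        pk, pk1, pk2 = p(k), p(k + 1), p(k + 2)
        rows[k] = [
            (1 - pk) * a * b,
            (1 - pk) * a * (1 - b) + (1 - pk) * (1 - a) ** 2 + pk * (1 - pk1) * a,
            (1 - pk) * (1 - a) * a + pk * (1 - pk1) * (1 - a) + pk * pk1 * (1 - pk2),
            pk * pk1 * pk2,
        ]
    return rows


def constant_tail(values):
    """Hypothetical helper: p_k = values[k-1], held constant beyond the list."""
    return lambda k: values[min(k, len(values)) - 1]


def upper_bound(rows):
    """The pursuer's best payoff: max over states k and positions w."""
    return max(max(r) for r in rows.values())


# Markov Hypothesis embedded in this framework: p_1 = q2, p_k = q1 for k >= 2.
markov = constant_tail([math.sqrt(3) - 1, (3 - math.sqrt(3)) / 2])
print(round(upper_bound(bram_rows(markov, k_max=6)), 5))   # 0.29423
```

Because rows beyond the last listed entry repeat, a small k_max suffices for a truncated array; such truncated arrays describe strategies with finite memory, of the kind taken up in Chapter III.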
3. Sub-Markov Strategy 
The strategy presented here is due to Bouchoux Ref.[6] and is characterized by a strategy where the evader's sequence of moves is not Markovian in itself but is generated by a substructure which is Markovian, hence the description Sub-Markov. This form of strategy is suggested by its use in providing optimal strategies in emission-prediction games described by Blackwell Ref.[7] and Matula Ref.[8]. The pursuer-evader game, while similar to emission-prediction games, is complicated by the fact that there are several distinct sequences of moves which lead to the possible terminal positions. Since the pursuer (predictor) must fire at one of those terminal points and not at a specific sequence of moves, the game is more complex. Bouchoux describes a strategy based upon three states, A, B and C, through which the evader transitions in a Markovian manner. When in state A the evader always turns, while in states B and C he always goes straight. After each move, straight or turn, the evader transitions between states according to a 3x3 transition matrix and is ready for his next move. This strategy is finite in the memory required by the evader, and Bouchoux obtained a bound on the game value of 0.28922 by optimizing upon the transition matrix.
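A minimal Python sketch of how such a sub-Markov evader looks from the pursuer's side. The 3x3 matrix M below is an arbitrary illustrative choice, not Bouchoux's optimized matrix (which is not given here), and the function names and observed history are hypothetical. The evader needs only his current hidden state, but the pursuer, who sees only the S/T moves, must carry a belief over {A, B, C}, update it after each observed move, and then sum over hidden paths to get P(W=w) three steps ahead, with W numbered as in Figure 2.8 (W=1 for TSS, W=4 for SSS).

```python
from itertools import product

STATES = "ABC"
EMIT = {"A": "T", "B": "S", "C": "S"}   # A always turns; B and C go straight

# Hypothetical transition matrix, for illustration only (NOT Bouchoux's
# optimized values): M[s][t] = P(next hidden state t | current state s).
M = {
    "A": {"A": 0.2, "B": 0.5, "C": 0.3},
    "B": {"A": 0.3, "B": 0.1, "C": 0.6},
    "C": {"A": 0.5, "B": 0.2, "C": 0.3},
}


def update(belief, move):
    """Condition the belief about the hidden state on the observed move,
    then advance it one transition (to the state that emits the next move)."""
    post = {s: belief[s] * (EMIT[s] == move) for s in STATES}
    z = sum(post.values())
    if z == 0:
        raise ValueError("move impossible under the current belief")
    return {t: sum(post[s] / z * M[s][t] for s in STATES) for t in STATES}


def terminal_position(st):
    """W = 1..4 for a 3-move S/T string, taking the prior move as R."""
    last, x = "R", 0
    for c in st:
        if c == "T":
            last = "L" if last == "R" else "R"
        x += 1 if last == "R" else -1
    return (x + 3) // 2 + 1            # displacement -3, -1, 1, 3 -> W = 1..4


def position_distribution(belief):
    """P(W=w), w = 1..4, over the next three moves given the belief."""
    dist = [0.0] * 4
    for path in product(STATES, repeat=3):
        prob = belief[path[0]]
        for s, t in zip(path, path[1:]):
            prob *= M[s][t]
        moves = "".join(EMIT[s] for s in path)
        dist[terminal_position(moves) - 1] += prob
    return dist


belief = {s: 1 / 3 for s in STATES}        # hypothetical starting belief
for move in "SSTS":                         # hypothetical observed moves
    belief = update(belief, move)
dist = position_distribution(belief)
print([round(x, 4) for x in dist], "-> fire at W =", dist.index(max(dist)) + 1)
```

The evader's memory is a single state label, yet the move sequence it produces is not Markovian in the moves themselves, since the pursuer's belief (and hence the conditional distribution of W) depends on the whole observed history.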
III. EXTENDED MARKOV STRATEGY


A. MOTIVATION AND DESCRIPTION 

The evader strategy to be investigated will be called Extended Markov because it is an extension of the finite dependence of the Markov Hypothesis strategy. The dependence will be finite but will extend beyond the previous n-1 steps. In the Markov Hypothesis strategy for the three-step game, discussed in II.D.1., the best strategy for the evader resulted in an upper bound on the game value of 0.29423. If the dependence is restricted to only the previous move instead of the previous two moves, the best strategy results in an upper bound of 0.29630 (Note: this is equivalent to adding the constraint q1=q2 to the non-linear problem described in II.D.1., with a solution at q1=q2=2/3). Since Bram's strategy showed that the Markov Hypothesis was not optimal for the three-step game, it seems that a Markovian strategy where the dependence is finite but extends beyond the last n-1 moves might result in a tighter bound on the game value than previously obtained. This is the class of strategies to be called Extended Markov. These strategies for the three-step game, Markovian in nature, will arise from a dependence upon the last three or more moves and will be called the n-dependent strategies, where n represents the level of dependence. In this context, the Markov Hypothesis strategy for the three-step game is the two-dependent strategy.
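The one-move-dependence figures quoted above (q1 = q2 = 2/3, bound 0.29630) can be checked with a few lines of Python. With q1 = q2 = q the two rows of the Markov Hypothesis formulas coincide; W=4 has probability q^3, which increases with q, while W=3 has probability q(1-q)(2-q), which exceeds q^3 exactly when q < 2/3. The sketch below (names are illustrative) locates the crossing by bisection on an interval where W=3 is decreasing.

```python
def one_move_dependence(q):
    """P(W=w), w = 1..4, when P(straight) = q in every state (q1 = q2 = q)."""
    p = 1 - q
    return [p * q * q,
            p * q * p + p ** 3 + q * p * q,
            p * p * q + q * p * p + q * q * p,
            q ** 3]


# On [0.5, 0.9], W3 = q(1-q)(2-q) decreases and W4 = q**3 increases, so the
# minimax q is where they cross: (1-q)(2-q) = q**2, i.e. q = 2/3.
lo, hi = 0.5, 0.9
for _ in range(60):                       # bisection on the sign of W3 - W4
    mid = (lo + hi) / 2
    w = one_move_dependence(mid)
    if w[2] > w[3]:
        lo = mid
    else:
        hi = mid
q_star = (lo + hi) / 2
print(round(q_star, 5), round(max(one_move_dependence(q_star)), 5))   # 0.66667 0.2963
```

At q = 2/3 the four probabilities are 4/27, 7/27, 8/27 and 8/27, so the bound is 8/27 = 0.29630, slightly worse than the 0.29423 obtained when the two states are allowed different q's.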


B. GENERAL N-DEPENDENT STRATEGY 

In the n-dependent strategy the evader will determine his next move based upon his previous n moves. The evader can be thought of as controlling 2^n variables, each being the probability of going (say) right given the previous n steps have been in a certain sequence. We will utilize the left-right symmetry of the problem by considering only paths where the last move is to the (say) right, resulting in only 2^(n-1) variables, each representing the probability of going (say) straight given the last n steps have produced a certain (n-1)-bit sequence of straights and turns. The general n-dependent strategy can be described by a Markov chain having 2^(n-1) states corresponding to the 2^(n-1) different (n-1)-bit sequences of straights and turns which are possible based on the last n moves (i.e. conditioning upon the last n moves is equivalent to conditioning on the last n-1 straights or turns). From each of the 2^(n-1) states there is a fixed probability that the evader will maneuver to one of the four final positions W in the next three steps. A 2^(n-1) x 2^(n-1) transition matrix will be used to describe the conditional probability of turning or continuing straight given the current state ((n-1)-bit sequence). Since the state describes the previous n moves in terms of straights and turns, only two possible transitions exist from each of the states: the first n-2 bits of the state transitioned to are determined by the last n-2 bits of the state transitioned from; the last bit will be S or T depending upon the new move. Due to this structure the transition matrix will be completely defined by 2^(n-1) variables (called q_i, i=1,...,2^(n-1)), which represent the probability of continuing straight given the current state. The other transition probability for that state will obviously be (1-q_i). Using a transition matrix so constructed, the conditional probability of ending in one of the four final positions (W=1, 2, 3 or 4) can be found. In order to arrive in position 1, for example, the sequence of states transitioned must result in the terminating three-bit sequence TSS, as can be seen from Figure 2.8. Thus P(W=w|STATE) is a function of the variables q_i (i=1,...,2^(n-1)), and the best n-dependent strategy is solved by the following non-linear program:

$$\min_{q_i}\ \max_{w,\ \mathrm{STATE}} P(W=w\mid \mathrm{STATE})\qquad \text{s.t. } 0\le q_i\le 1,\ \ i=1,\dots,2^{n-1}$$

For general n, it is seen that the above program involves minimizing the maximum of 2^(n+1) (4 positions x 2^(n-1) states) non-linear functions of up to 2^(n-1) variables. No analytic solution has been found, and in later sections near-optimal solutions will be found by non-linear search techniques.
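The program above is simple to state but tedious to expand by hand for larger n. A minimal Python sketch of its objective, assuming states are written as (n-1)-bit S/T strings with the most recent bit last (so that, as in the three-dependent transition matrix, state ST moves to TS on a straight and to TT on a turn): it enumerates the eight three-move continuations from each state and accumulates P(W=w|STATE). The names are illustrative; the feasible direction search used in this thesis would call such an objective repeatedly, but that search is not shown.

```python
import math
from itertools import product


def w_index(st):
    """W = 1..4 for a 3-move S/T string, taking the prior move as R
    (TSS -> 1, SSS -> 4, as in Figure 2.8)."""
    last, x = "R", 0
    for c in st:
        if c == "T":
            last = "L" if last == "R" else "R"
        x += 1 if last == "R" else -1
    return (x + 3) // 2 + 1


def n_dependent_table(q):
    """P(W=w | STATE) for the three-step game under an n-dependent strategy.

    q maps each (n-1)-bit S/T state (most recent bit last) to the
    probability of going straight from that state.
    """
    table = {}
    for state in q:
        row = [0.0] * 4
        for moves in product("ST", repeat=3):
            s, prob = state, 1.0
            for m in moves:
                prob *= q[s] if m == "S" else 1 - q[s]
                s = s[1:] + m          # drop the oldest bit, append the new one
            row[w_index("".join(moves)) - 1] += prob
        table[state] = row
    return table


def bound(q):
    """Objective of the non-linear program: max over states and positions."""
    return max(max(row) for row in n_dependent_table(q).values())


q1, q2 = (3 - math.sqrt(3)) / 2, math.sqrt(3) - 1
print(round(bound({"S": q1, "T": q2}), 5))                          # 0.29423
print(round(bound({"SS": q1, "ST": q2, "TS": q1, "TT": q2}), 5))    # 0.29423
three_dep = {"SS": 0.66163, "ST": 0.70054, "TS": 0.62489, "TT": 0.70052}
print(bound(three_dep), max(n_dependent_table(three_dep)["TS"]))
```

The second line illustrates the embedding used to seed the search: setting the three-dependent q's equal in pairs reproduces the Markov Hypothesis bound. With the three-dependent values reported in the next section the bound comes out at about 0.28964, and the largest entry in the TS row at about 0.28191.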







C. THREE-DEPENDENT STRATEGY

The first extension of the Markov Hypothesis strategy is the three-dependent strategy, described by four states (SS, ST, TS, TT) and a 4x4 transition matrix shown in Figure 3.1, where:

q1 = P(next move is straight | State is SS)

or equivalently,

q1 = P(next state is SS | last state was SS)

The sixteen conditional probabilities of terminating in one of the four positions W, given the evader starts from one of these four states, are listed in Table IV. The best solution found using the three-dependent strategy gives an upper bound on the game value of 0.28964 when:

q1 = 0.66163     q3 = 0.62489
q2 = 0.70054     q4 = 0.70052

The matrix of conditional probabilities evaluated at this point is given in Table V. This solution was found by utilizing an improving feasible direction search which was started from a known "good" solution. For the three-dependent strategy a good starting point is found by applying the known two-dependent (Markov Hypothesis) solution to the three-dependent structure. If one applies the restriction q1=q3 and q2=q4 to the three-dependent strategy, it is equivalent
                          NEXT STATE
                  SS       ST       TS       TT
LAST      SS      q1      1-q1      0        0
STATE     ST      0        0        q2      1-q2
          TS      q3      1-q3      0        0
          TT      0        0        q4      1-q4

Figure 3.1 4x4 Transition Matrix for 3-Dependent Strategy.
TABLE IV

P(W=w|STATE) for 3-Dependent Strategy

Notation: p_i = 1 - q_i, i = 1, 2, 3, 4

$$
\begin{aligned}
P(W=1\mid SS) &= p_1q_2q_3\\
P(W=2\mid SS) &= p_1q_2p_3 + p_1p_2p_4 + q_1p_1q_2\\
P(W=3\mid SS) &= p_1p_2q_4 + q_1p_1p_2 + q_1^2p_1\\
P(W=4\mid SS) &= q_1^3\\
P(W=1\mid ST) &= p_2q_4q_3\\
P(W=2\mid ST) &= p_2q_4p_3 + p_2p_4^2 + q_2^2p_3\\
P(W=3\mid ST) &= p_2p_4q_4 + q_2p_3p_2 + q_2q_3p_1\\
P(W=4\mid ST) &= q_2q_3q_1\\
P(W=1\mid TS) &= p_3q_2q_3\\
P(W=2\mid TS) &= p_3^2q_2 + p_3p_2p_4 + q_3p_1q_2\\
P(W=3\mid TS) &= p_3p_2q_4 + q_3p_1p_2 + q_3q_1p_1\\
P(W=4\mid TS) &= q_3q_1^2\\
P(W=1\mid TT) &= p_4q_4q_3\\
P(W=2\mid TT) &= p_4q_4p_3 + p_4^3 + q_4p_3q_2\\
P(W=3\mid TT) &= p_4^2q_4 + q_4p_3p_2 + q_4q_3p_1\\
P(W=4\mid TT) &= q_4q_3q_1
\end{aligned}
$$







TABLE V

Good Evader Strategy in 3-Dependent Case

q1 = P(S|SS) = 0.66163
q2 = P(S|ST) = 0.70054
q3 = P(S|TS) = 0.62489
q4 = P(S|TT) = 0.70052

P(W=w|STATE)

            W = 1       2       3       4
STATE
  SS            …       …    .28615  .28964
  ST            …    .28964  .28964  .28964
  TS            …       …       …       …
  TT            …       …    .28964  .28964
TABLE VI

Good Evader Strategy in 4-Dependent Case

q1 = P(S|SSS) = …           q5 = P(S|TSS) = 0.665…
q2 = P(S|SST) = 0.69579     q6 = P(S|TST) = 0.69579
q3 = P(S|STS) = 0.62274     q7 = P(S|TTS) = 0.62472
q4 = P(S|STT) = 0.69579     q8 = P(S|TTT) = 0.69579

P(W=w|STATE)

            W = 1       2       3       4
STATE
Se) Ailes (0) eo 7 PLS BS he 228659 
Soul Hoes BPS TES eo.» LASS OAS 
Sees 5 log 2 ee 14 028465 2/7409 
Sill oe / ZOU $2092 5 26925 
Los 5 AS AS 27 O06 Bete) as eee 5 
ieowh So ee B2OT 2D RZOULS Be Pas, 
to the strategy discussed in II.D.1., with an upper bound of 0.29423 when:

q1 = q3 = 0.63397     q2 = q4 = 0.73205


Analogously, any near-optimal solution to the n-dependent strategy will provide a "good" initial solution to the (n+1)-dependent strategy. While the solution given above for the three-dependent strategy is not known to be optimal (it is a local minimum of the problem described in III.B.), it does represent a significant improvement over the two-dependent strategy (0.29423) and is close in value to the infinite strategy of Bram (0.28903). Appendix A presents an analysis of the above three-dependent solution and shows that the proposed solution does satisfy the first-order Kuhn-Tucker conditions (necessary but not sufficient) for a local minimum. It is interesting to note that in the proposed solution q2 ≈ q4, i.e. P(S|ST) ≈ P(S|TT).

Additionally, in order for the pursuer to receive his maximum achievable payoff he must refrain from attacking when the state is TS, or be limited to a payoff of 0.28191.


D. FOUR AND FIVE-DEPENDENT STRATEGIES
The treatment of the four-dependent and five-dependent strategies is equivalent to that of the previously described three-dependent strategy, with the expansion of the state space and number of variables involved to eight and sixteen respectively. Good solutions to the four and five-dependent strategies were found, as in the three-dependent case, by starting at a known near-optimal set of values for the q_i's and conducting an improving feasible direction search until a local minimum was found. The best solutions thus found for the four and five-dependent strategies and the resulting conditional probability matrices are shown in Tables VI and VII.


E. CHARACTERISTICS OF THREE, FOUR AND FIVE-DEPENDENT STRATEGIES

The solutions found for the three, four and five-dependent strategies, outlined in Tables V, VI and VII, show several revealing characteristics. In each case the conditional probability of continuing straight given the (n-1)-bit state is not dependent upon all of the information contained in that (n-1)-bit sequence. The probabilities are dependent only upon the number of time steps elapsed since the last turn maneuver, and not upon any turn-straight information further in the past than that last turn. For example, letting t denote the number of time steps since the last turn, then in the five-dependent solution:

q_i for every state ending in T      = P(S|t=1)
q_i for every state ending in TS     = P(S|t=2)
q_i for every state ending in TSS    = P(S|t=3)
q_i for the state TSSS               = P(S|t=4)
q_i for the state SSSS               = P(S|t>4)

It is hypothesized that this characteristic holds for the optimal form of any n-dependent strategy. If this is so, it can be seen that the n-dependent strategy is a finite (truncated) version of the Bram strategy presented in II.D.2., and as the level of dependence n is increased without bound the value of 0.28903 of Bram is expected to hold.

Each of the investigated strategies is also characterized by having some states in which the pursuer must refrain from firing, or else he forfeits his ability to maximize his payoff. As the level of dependence increases, however, the penalty to the pursuer who fires when the evader is in one of these states diminishes. Table III shows that under Bram's strategy there is no time at which the pursuer cannot achieve his maximum payoff, given he always fires at position …


IV. FOUR-STEP GAME 


The four-step pursuer-evader game has been the subject of little interest due to the unsolved nature of the three-step game. We shall briefly look at the four-step game and discover that the apparent characteristic structure of the three-step extended Markov strategies does not extend to the four-step game. Given a four-step time delay between the attacker's time of fire and subsequent detonation, the evader may achieve five different positions through the sixteen different four-bit sequences of turns and straights, as shown in Figure 4.1. The Markov Hypothesis strategy solution to the four-step game is due to Washburn Ref.[9]. In the four-step game the Markov Hypothesis has dependence extending to the last three moves; the best strategy under this hypothesis bounds the value of the game to 0.23740 or below. The q values and resulting conditional probability matrix are shown in Table VIII. The first extended Markov strategy of the four-step game, and the only one investigated, is the four-dependent strategy; in this strategy dependence reaches back to the last four moves. The best solution found using the four-dependent strategy is shown in Table IX and provides an upper bound of 0.23734. While this is an improvement over the Markov Hypothesis solution of Washburn, the improvement is very slight. Additionally, no underlying characteristic such as that discussed in III.E. for the three-step extended Markov strategies is apparent from the three and four-dependent strategies investigated for the four-step game.


Figure 4.1 Achievable Evader Positions in Four-Step Game.
TABLE VIII


Markov-Hypothesis Strategy for Four-Step Game 


q, = 0.69681 qd, = 0.70169 
do = 0.69681 qZ = Bo SO YS 
P(W=W |STATE) 
J 1 2 3 l 5 
STATE 
3s paca 30 core SABI, 5 BOF ee) 
ST 5 WS 25, sole Id Ao 18lS, e257 10 ae 0) 
TS 5 leas reverie eo el ae (VAG ae 40 
TT HOSS 1 aoe 2 Sel, ee ae@ ea Se 
TABLE IX 
Four-Dependent Strategy to Four-Step Game
a, = 0.69724 dz = 0.69728 
dn = DCS qZ = 0.69727 
a3 = 0.70466 qq = 0.70469 
en 0.69654 Ga = sik) 1) ue 
P(W=W| STATE) 

a: ° : ; ; 
Sets) Oc Pils ios SABI EME N23006 SE ersye 
SST eee 4 relzors) O)s: ceo A ibe w, Me) 5 oi wet OD 
Sas eeU> 3 ee2 6 ASC Suh ra tae Se ee 
oT 5 MG) By48, 5 ss) 1 et | ped ie fee 10,5) 
TSS OA 7 meso aoe ao | 2 S25 Oe 
io T alozs 4) lltous, Oks: wo wes DO | ENG 3) 
iS mes 2 mi coe / 52 BOSE 5 SNS: Fee Ss 
V. CONCLUSIONS AND REMARKS


The three-step pursuer-evader game remains unsolved. The investigation of the extended Markovian strategies has been shown to result in improved evader strategies over the Markov Hypothesis, but is not known to provide a better strategy than the infinite memory strategy of Bram; in fact it is hypothesized that the n-dependent extended Markov strategy for the three-step game represents a finite approximation to the strategy of Bram. In this respect the results are not entirely disappointing, in that they provide a finite strategy which appears to converge rather rapidly to a strategy equivalent to Bram's infinite memory strategy. The five-dependent strategy for the three-step game relies upon five distinct variables, one for each value of t in III.E. (P(S|t=1), P(S|t=2), P(S|t=3), P(S|t=4) and P(S|t>4)), which provide an upper bound of 0.28910, reasonably close to the bound of 0.28903 provided by Bram's infinite strategy. The near-optimal extended Markov strategies presented in Tables V, VI and VII represent local minima to the non-linear programming problem discussed in III.B.

While these can be seen to represent improvements over the Markov Hypothesis strategy, they may not be the globally minimum strategies within the extended Markov structure. As the level of dependence in the extended Markov strategies increases, the mathematical complexity increases disproportionately; only the apparent characteristic of these extended Markov strategies discussed in III.E. makes them remotely attractive.

It still remains to be answered why the three-step game
is apparently non-Markovian in its optimal evader strategy
while the one- and two-step games are Markovian. The evader
strategy proposed by this thesis, as well as the strategy
described by Bouchoux, represent abstractions from the strict
Markov Hypothesis solution, and although both strategies
lower the pursuer's maximum payoff, neither bound is as
tight as that of the infinite strategy of Bram, which is
strictly non-Markovian in nature. While improved finite
strategies may be possible by further abstraction from a
strictly Markovian strategy, it has been conjectured that no
finite strategy is optimal for the evader. This is known to
be true for the pursuer, since he must observe the evader for
an ever-increasing length of time if he wishes to achieve
optimality (with the exception of the one-step game, where
both sides have finite optimal strategies). Bouchoux
suggests that a generalization of his sub-Markov strategy,
involving three distinct Markov states, each with some fixed
probability of generating a straight or a turn, might provide
a tighter bound on the game value due to its further
abstraction from Markov behavior. However, the mathematical
complexity of locating optimal or near-optimal strategies
within this framework is considerable.

The four-step game appears even more difficult. The
Markov Hypothesis solution is shown to be a sub-optimal
strategy, being dominated by the three-dependent extended
Markov strategy of Table IX. The strategies found for the
four-step game in Tables VIII and IX appear to preclude an
extension of Bram's infinite strategy to the four-step game.
The apparent dissimilarity among the known near-optimal
evader strategies for the two-, three-, and four-step games is
perplexing.

The discrete evasion game on a two- or three-dimensional
surface is another area which holds promise for future
research. The work of Ferguson solves the two-step game for
a special class of graphs he calls restricted n-graphs;
however, the two-step game on more general two-dimensional
surfaces, as well as the three-step game, remain unsolved.

The discrete pursuer-evader game, as described by Isaacs
in 1954, was generated as a simplification of a much more
complex problem. The continuing mystery surrounding all but
the simplest of these "simplified" games provides a wealth
of opportunity and motivation for future research.
APPENDIX A


INVESTIGATION OF THE THREE-STEP EXTENDED MARKOV STRATEGY 


In III.B., the general n-dependent extended Markov
strategy was presented. The best solution found for the
case n=3 is given in Table V. As stated earlier, this solution
is not known to be optimal, but it can be shown to satisfy
the first-order Kuhn-Tucker conditions (necessary but not
sufficient) for a global minimum.

For the three-dependent case the problem may be stated
as follows:

$$\min_{q}\ \max_{W,\ \mathrm{STATE}} \{P(W = W \mid \mathrm{STATE})\}$$

$$\text{s.t.}\quad 0.0 \le q_i \le 1.0, \qquad i = 1,\dots,4$$


There are sixteen separate functions (see Table IV) from
which the maximum will be selected by the pursuer's choice
of W and STATE (i.e., by his selection of aim point and time
of fire); the evader must select the $q_i$'s so as to minimize
this maximum payoff. Let $f_1, f_2, \ldots, f_{16}$ represent the
sixteen functions described in Table IV; the problem then
becomes:

$$\min_{q}\ \max_{W,\ \mathrm{STATE}} \{f_1, f_2, \ldots, f_{16}\}$$

$$\text{s.t.}\quad 0.0 \le q_i \le 1.0, \qquad i = 1,\dots,4$$





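Introducing a dummy variable $q_5$, the above non-linear
program may be equivalently written:

$$\begin{aligned}
\min\ & q_5 \\
\text{s.t.}\ & q_5 - f_j \ge 0.0, && j = 1,\dots,16 \\
& q_i \ge 0.0, && i = 1,\dots,4 \\
& 1.0 - q_i \ge 0.0, && i = 1,\dots,4
\end{aligned}$$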
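The dummy variable works because, for any fixed choice of the $q_i$, the smallest $q_5$ satisfying every constraint $q_5 - f_j \ge 0.0$ is exactly $\max_j f_j$, so minimizing $q_5$ is the same as minimizing the pursuer's maximum payoff. The Python sketch below illustrates only that equivalence. It assumes a single probability $q$ and two hypothetical toy payoffs ($q$ and $1 - q$), not the Table IV functions, and it uses a plain grid search that would not scale to the four-variable problem; the function names are illustrative.

```python
from typing import Callable, Sequence, Tuple

Payoff = Callable[[float], float]


def smallest_feasible_q5(q: float, fs: Sequence[Payoff]) -> float:
    """Least q5 satisfying q5 - f_j(q) >= 0 for every j, at a fixed q."""
    return max(f(q) for f in fs)


def minimize_epigraph(fs: Sequence[Payoff], steps: int = 1000) -> Tuple[float, float]:
    """Grid search over q in [0, 1] for the smallest feasible q5.

    Returns (q, q5). Since the least feasible q5 at each q is max_j f_j(q),
    this equals minimizing the maximum payoff directly.
    """
    best_q, best_q5 = 0.0, float("inf")
    for k in range(steps + 1):
        q = k / steps
        q5 = smallest_feasible_q5(q, fs)
        if q5 < best_q5:
            best_q, best_q5 = q, q5
    return best_q, best_q5


# Hypothetical toy payoffs (NOT the Table IV functions): min-max is 0.5 at q = 0.5.
toy = [lambda q: q, lambda q: 1.0 - q]
q_star, value = minimize_epigraph(toy)
print(q_star, value)  # 0.5 0.5
```

In the three-dependent game the same reasoning holds with four $q_i$'s and sixteen $f_j$'s; only the search over the $q_i$'s becomes harder.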


The structure of this problem allows some additional
restrictions to be placed upon the optimal solution. On
close inspection of the functions $f_j$, it can be seen that if

$$q_i = 0.0 \quad \text{or} \quad 1.0 - q_i = 0.0 \qquad \text{for some } i \in \{1,\dots,4\},$$

then at least one of the $f_j$'s will have a value of 0.0. If
any $f_j = 0.0$, then the remaining three $f_j$'s associated with
the same initial state must sum to 1.0, since for any initial
state:

$$\sum_{W} P(W = W \mid \mathrm{STATE}) = 1.0$$

The maximum of three non-negative numbers which sum to 1.0
must be at least 1/3, so such a strategy would give the pursuer
a payoff of at least 1/3, which is greater than the known upper
bound on the value of the game. Therefore:


$$0.0 < q_i < 1.0, \qquad i = 1,\dots,4$$
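The 1/3 bound used above is a pigeonhole argument. Writing $f_a, f_b, f_c$ (labels chosen here for illustration) for the three remaining functions of one initial state, and comparing with the bound of 0.28903 from Bram's strategy:

$$f_a, f_b, f_c \ge 0.0,\quad f_a + f_b + f_c = 1.0 \;\Longrightarrow\; 3\max\{f_a, f_b, f_c\} \ge f_a + f_b + f_c = 1.0 \;\Longrightarrow\; \max\{f_a, f_b, f_c\} \ge \tfrac{1}{3} \approx 0.333 > 0.28903$$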





Based upon the above characteristic of the problem, the
constraints

$$1.0 - q_i \ge 0.0, \qquad i = 1,\dots,4$$

will not be binding at the optimal solution and may be
dropped without consequence, resulting in:

$$\begin{aligned}
\min\ & q_5 \\
\text{s.t.}\ & q_5 - f_j \ge 0.0, && j = 1,\dots,16 \\
& q_i \ge 0.0, && i = 1,\dots,5
\end{aligned}$$


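The first-order Kuhn-Tucker conditions for the above problem
require that, at an optimal point, there exist a set of
$\lambda$'s such that:

$$\begin{aligned}
&\frac{\partial L}{\partial q_i} \ge 0.0, && q_i \frac{\partial L}{\partial q_i} = 0.0, && q_i \ge 0.0, && i = 1,\dots,5 \\
&\frac{\partial L}{\partial \lambda_j} \le 0.0, && \lambda_j \frac{\partial L}{\partial \lambda_j} = 0.0, && \lambda_j \ge 0.0, && j = 1,\dots,16
\end{aligned}$$

where:

$$L = q_5 + \sum_{j=1}^{16} \lambda_j \,(f_j - q_5)$$

These conditions may be further modified. Since every $q_i$ is
strictly positive at the proposed solution ($0.0 < q_i < 1.0$ for
$i = 1,\dots,4$, and $q_5 > 0$), the conditions
$q_i \, \partial L/\partial q_i = 0.0$ require:

$$\frac{\partial L}{\partial q_i} = 0.0, \qquad i = 1,\dots,5$$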





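In the proposed near-optimal solution in Table V, seven
of the sixteen inequality constraints are binding; that is:

$$f_4 = f_6 = f_7 = f_8 = f_{14} = f_{15} = f_{16} = q_5 = 0.28964$$

The remaining nine constraints are slack, and it follows from
$\lambda_j \, \partial L/\partial \lambda_j = 0.0$ that:

$$\lambda_1 = \lambda_2 = \lambda_3 = \lambda_5 = \lambda_9 = \lambda_{10} = \lambda_{11} = \lambda_{12} = \lambda_{13} = 0.0$$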


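The proposed solution must therefore satisfy the following
conditions:

$$\frac{\partial L}{\partial q_i} = \sum_{j \in J} \lambda_j \frac{\partial f_j}{\partial q_i} = 0.0, \quad i = 1,\dots,4; \qquad \frac{\partial L}{\partial q_5} = 1 - \sum_{j \in J} \lambda_j = 0.0$$

where $J = \{4, 6, 7, 8, 14, 15, 16\}$ is the set of binding
constraints.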
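The conditions above can be checked mechanically once the multipliers and the gradients $\partial f_j/\partial q_i$ of the binding functions are known. The Python sketch below assumes those gradients are supplied as plain numbers (the Table IV expressions are not reproduced here); `check_modified_kkt` is a hypothetical helper name, and the usage example is a one-variable toy with payoffs $q$ and $1 - q$, not the three-dependent game.

```python
from typing import Dict, Sequence


def check_modified_kkt(
    grads: Dict[int, Sequence[float]],
    lam: Dict[int, float],
    tol: float = 1e-9,
) -> bool:
    """Check the modified Kuhn-Tucker conditions at an interior point.

    grads[j][i] holds df_j/dq_(i+1) for each binding constraint j.
    lam[j] is the multiplier of binding constraint j; slack constraints
    already have zero multipliers and are simply omitted.
    """
    if set(grads) != set(lam):
        raise ValueError("grads and lam must cover the same binding constraints")
    # Non-negativity of every multiplier.
    if any(v < -tol for v in lam.values()):
        return False
    # dL/dq5 = 1 - sum_j lambda_j must vanish.
    if abs(1.0 - sum(lam.values())) > tol:
        return False
    # dL/dq_i = sum_j lambda_j * df_j/dq_i must vanish for every q_i other than q5.
    n_q = len(next(iter(grads.values())))
    for i in range(n_q):
        if abs(sum(lam[j] * grads[j][i] for j in grads)) > tol:
            return False
    return True


# Hypothetical one-variable toy: f_1 = q and f_2 = 1 - q, both binding at q = 0.5,
# so df_1/dq = 1 and df_2/dq = -1.
toy_grads = {1: [1.0], 2: [-1.0]}
print(check_modified_kkt(toy_grads, {1: 0.5, 2: 0.5}))  # True
print(check_modified_kkt(toy_grads, {1: 0.7, 2: 0.3}))  # False
```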


After substitution of the values of $q_1, \ldots, q_5$ from Table V,
the five conditions $\partial L/\partial q_i = 0.0$ become a set of five
linear equations in seven unknowns ($\lambda_4, \lambda_6, \lambda_7,
\lambda_8, \lambda_{14}, \lambda_{15}, \lambda_{16}$). Any solution to this
set of equations which also satisfies the condition:

$$\lambda_j \ge 0.0, \qquad j = 4, 6, 7, 8, 14, 15, 16$$


will satisfy the modified Kuhn-Tucker conditions. Using
linear programming methods, such a set of $\lambda$'s was found,
thereby verifying the satisfaction of the Kuhn-Tucker
conditions at the proposed three-dependent strategy of Table V.
The near-optimal solutions to the four- and five-dependent
strategies (Tables VI and VII) could be analyzed in a
similar manner.
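The search for non-negative $\lambda$'s is a linear feasibility problem: find $\lambda \ge 0$ with $A\lambda = b$, where each row of $A$ holds the coefficients of one condition $\partial L/\partial q_i = 0.0$, and $b$ is zero except for the $\partial L/\partial q_5$ row, whose coefficients are all 1 and whose right-hand side is 1. The thesis used linear programming methods; the Python sketch below swaps in a simpler technique, enumerating basic solutions (choosing 5 of the 7 columns, only 21 cases here). This is exhaustive for a full-row-rank system because a feasible system always has a basic feasible solution. Exact `Fraction` arithmetic avoids round-off in the sign test. The helper names and the toy system are hypothetical; no Table IV data are used.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Sequence


def solve_square(a: List[List[Fraction]], b: List[Fraction]) -> Optional[List[Fraction]]:
    """Solve a square system exactly by Gauss-Jordan elimination.

    Returns None when the matrix is singular.
    """
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def nonnegative_solution(
    a: Sequence[Sequence[float]], b: Sequence[float]
) -> Optional[List[Fraction]]:
    """Return some lambda >= 0 with A lambda = b, or None if none exists.

    Assumes A has full row rank; then a feasible system always has a basic
    feasible solution, so trying every choice of len(b) basis columns is exhaustive.
    """
    rows, cols = len(a), len(a[0])
    A = [[Fraction(x) for x in row] for row in a]
    B = [Fraction(x) for x in b]
    for basis in combinations(range(cols), rows):
        sub = [[A[r][c] for c in basis] for r in range(rows)]
        x = solve_square(sub, B)
        if x is not None and all(v >= 0 for v in x):
            lam = [Fraction(0)] * cols
            for c, v in zip(basis, x):
                lam[c] = v
            return lam
    return None


# Hypothetical toy system (NOT Table IV data): one stationarity row
# lambda_1 - lambda_2 = 0 plus the normalization row sum(lambda) = 1.
lam = nonnegative_solution([[1, -1, 0], [1, 1, 1]], [0, 1])
print(lam)  # [Fraction(1, 2), Fraction(1, 2), Fraction(0, 1)]
```

For the three-dependent game, `a` would be the 5 x 7 coefficient matrix in the unknowns $\lambda_4, \lambda_6, \lambda_7, \lambda_8, \lambda_{14}, \lambda_{15}, \lambda_{16}$; a `None` result would mean the Kuhn-Tucker conditions fail at the proposed point.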